Risk factors & cohort studies

Adam La Caze

Semester 1, 2023

Objectives

  • Be able to describe what a risk factor is and how risk factors are used in understanding and managing disease

  • Be able to calculate prevalence and incidence

  • Be able to describe the roles of cohort studies

  • Be able to describe the key sources of bias in cohort studies and how these biases are minimised

Key concepts

  • Risk factors are factors that predict disease; they are not necessarily causal

  • Biomarkers measure biological processes and are used in a large number of ways, including to measure exposure, response to treatment and disease prognosis

  • Restriction is an important method for reducing bias in cohort studies (e.g. restricting to incident (new) users of the intervention under study)

Risk factors

Cohort studies

Key design aspects of a cohort study

Key roles for cohort studies

  1. Measure disease/outcome incidence

  2. Identify risk factors for the disease/outcome

  3. Measure the effect of exposure to a risk factor on the incidence of a disease or outcome

Key terms

Risk

is the probability of an event in a defined period of time

Risk factor

is a characteristic that is associated with an increased risk of the disease or outcome

Prevalence

is the fraction of a group of people possessing a clinical condition at any one time

Incidence

is the fraction of a group of people initially free of disease who develop the condition over a period of time

Framingham Heart Study

  • Cohort study started in 1948 in Framingham, Massachusetts (close to Boston)

  • The study set out to better understand the causes of cardiovascular disease

  • The study started with a sample of people living in Framingham (the population was predominantly white and middle class)

  • Participants provided a medical history and had a physical examination; these were repeated at regular intervals

  • Most participants were free from cardiovascular disease at time of recruitment

  • As participants developed cardiovascular disease during follow-up, their baseline characteristics could be compared with those of participants who remained free of disease, allowing risk factors to be identified

  • Key early results: high cholesterol, high blood pressure and physical inactivity are risk factors for cardiovascular disease

  • The Framingham Heart Study has included genetic data since 2006

  • Over time, the offspring of the original participants were recruited, along with a more diverse population

Cardiac risk calculators

Heart Foundation 5-year cardiovascular risk calculator for people without diabetes

Risk factors

  • Predict disease

  • May or may not be causal

  • May or may not be modifiable (e.g. gender is a non-modifiable risk factor for cardiovascular disease)

  • Can have a long latency period (e.g. the effects of high cholesterol on coronary heart disease take decades to appear)

  • Often lead to a small increase in risk

  • Often combine to increase risk: a single disease can have multiple risk factors (e.g. hypertension, smoking and male gender combine to increase the risk of coronary heart disease)

  • A single risk factor can increase the risk of multiple diseases (e.g. hypertension increases the risk of stroke, heart failure and coronary heart disease)

Risk v causation

Imagine $Z$ is alcohol intake, $X$ is cigarette smoking exposure (e.g. packs per day) and $Y$ is the lung cancer rate. Paths **A** and **B** represent different causal assumptions about how these variables are related.
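
A small simulation can illustrate how a factor can predict an outcome without causing it. The sketch below assumes one hypothetical causal structure: a binary variable `z` makes both `x` and `y` more likely, while `x` has no effect on `y` at all. The variables and probabilities are invented for illustration; they are not data on alcohol, smoking or lung cancer.

```python
import random

def simulate(n=200_000, seed=1):
    """Hypothetical structure: z -> x and z -> y, with no direct x -> y effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.3                    # common cause present in 30%
        x = rng.random() < (0.6 if z else 0.2)    # z makes x more likely
        y = rng.random() < (0.10 if z else 0.02)  # y depends on z only, never on x
        rows.append((z, x, y))
    return rows

def risk(rows, x_value, z_value=None):
    """Proportion with y among rows with the given x, optionally within one z stratum."""
    outcomes = [y for z, x, y in rows
                if x == x_value and (z_value is None or z == z_value)]
    return sum(outcomes) / len(outcomes)

rows = simulate()
crude_rr = risk(rows, True) / risk(rows, False)
rr_z0 = risk(rows, True, z_value=False) / risk(rows, False, z_value=False)
rr_z1 = risk(rows, True, z_value=True) / risk(rows, False, z_value=True)
print(f"crude RR {crude_rr:.2f}; RR within z=0 {rr_z0:.2f}; within z=1 {rr_z1:.2f}")
```

With these assumed probabilities the crude risk ratio is close to 1.9, so `x` behaves like a risk factor, yet within each level of `z` the risk ratio is close to 1.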

Biomarker

A defined characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or responses to an exposure or intervention, including therapeutic interventions. Molecular, histologic, radiographic, or physiologic characteristics are types of biomarkers. A biomarker is not an assessment of how an individual feels, functions, or survives.

FDA-NIH Biomarker Working Group (2016)

Types of biomarker

| Biomarker type | FDA definition | Example |
|---|---|---|
| Susceptibility/risk biomarker | A biomarker that indicates the potential for developing a disease or medical condition in an individual who does not currently have clinically apparent disease or the medical condition. | LDL and cardiovascular disease |
| Diagnostic biomarker | A biomarker used to detect or confirm presence of a disease or condition of interest or to identify individuals with a subtype of the disease. | Troponin and acute coronary syndrome |
| Monitoring biomarker | A biomarker measured repeatedly for assessing status of a disease or medical condition or for evidence of exposure to (or effect of) a medical product or an environmental agent. | HbA1c and T2DM |
| Prognostic biomarker | A biomarker used to identify likelihood of a clinical event, disease recurrence or progression in patients who have the disease or medical condition of interest. | Hypertension and secondary cardiovascular risk |
| Predictive biomarker | A biomarker used to identify individuals who are more likely than similar individuals without the biomarker to experience a favorable or unfavorable effect from exposure to a medical product or an environmental agent. | Thiopurine methyltransferase genotype prior to treatment with 6-mercaptopurine |
| Response biomarker | A biomarker used to show that a biological response, potentially beneficial or harmful, has occurred in an individual who has been exposed to a medical product or an environmental agent; includes pharmacodynamic biomarkers and surrogate endpoint biomarkers. | HbA1c, blood pressure, LDL in cardiovascular disease |
| Safety biomarker | A biomarker measured before or after an exposure to a medical product or an environmental agent to indicate the likelihood, presence, or extent of toxicity as an adverse effect. | Serum potassium levels in patients taking ACE inhibitors |

Types of surrogate endpoint biomarkers

Validated surrogate endpoint

an endpoint supported by clear mechanistic rationale and clinical data providing strong evidence that an effect on the surrogate endpoint predicts specific clinical benefit

Reasonably likely surrogate endpoint

an endpoint supported by strong mechanistic and/or epidemiologic rationale such that an effect on the surrogate endpoint is expected to be correlated with an endpoint intended to assess clinical benefit in clinical trials, but without sufficient clinical data to show that it is a validated surrogate endpoint.

Candidate surrogate endpoint

an endpoint that is still under evaluation for its ability to predict clinical benefit

Assessing biomarkers

Analytical validity:

does the method for measuring the biomarker accurately measure the biomarker?

Clinical validity:

does the biomarker accurately measure or predict the clinical endpoint/outcome?

Clinical utility:

does use of the biomarker to inform care improve clinical outcomes?

Expressing risk

Prevalence and Incidence

Prevalence

number of cases/total population at a specified time

Incidence

number of new cases/total population at risk per unit of time

Where: ‘at risk’ = ‘free from disease or condition’

Example: prevalence and incidence

  • Occurrence of lung cancer in a cohort of 10 000 people followed over a three-year period
  • No participant recovered from lung cancer: the only participants who ceased to have lung cancer were those who died
  • 4 people had lung cancer at the start of the study
  • 16 people developed lung cancer during the study

Prevalence: fraction of participants with condition at any given point in time

  • Start of 2010: 4/10 000

  • End of 2010: 5/9 996

Incidence: fraction of participants initially free of condition who develop the condition over a given time

  • During the 3-year study: 16/9 996 (4 participants had lung cancer at the start of the study)

  • During 2012: 5/9 985 (15 participants had lung cancer prior to the start of 2012)
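
The arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the counts given in the example; the function names are illustrative, and the incidence denominators subtract participants who already had lung cancer because they are no longer at risk.

```python
def prevalence(existing_cases, population):
    """Existing cases / total population at a specified time."""
    return existing_cases / population

def incidence(new_cases, population_at_risk):
    """New cases / people free of the condition at the start, over a stated period."""
    return new_cases / population_at_risk

population = 10_000
prev_start_2010 = prevalence(4, population)
inc_study = incidence(16, population - 4)   # 4 people had lung cancer at the start
inc_2012 = incidence(5, population - 15)    # 15 had lung cancer before 2012

for label, value in [("Prevalence, start of 2010", prev_start_2010),
                     ("Incidence, 3-year study", inc_study),
                     ("Incidence, during 2012", inc_2012)]:
    print(f"{label}: {value * 1000:.2f} per 1000")
```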

Incidence density

number of new cases/number of person years at risk

Incidence density accounts for different follow-up times

  • A cohort study seeks to determine incidence density of dementia in an elderly population

  • The study goes for 2 years, but recruitment occurs over time

  • 50 participants enrol in year 1 and 50 participants enrol in year 2

  • There are 8 new cases over the two years.

Number of new cases = 8
Number of person years at risk = (50 x 2) + (50 x 1) = 150 person years
Incidence density = 8/150 person years = 5.3 per 100 person years
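
A minimal sketch of the person-time calculation, assuming (as the worked example does) that each enrolment group is followed until the end of the study. It ignores the small reduction in person time when a participant is diagnosed and stops being at risk, and any loss to follow-up; a fuller analysis would sum each participant's own time at risk.

```python
def person_years(groups):
    """Total person years; groups is a list of (participants, years_followed)."""
    return sum(participants * years for participants, years in groups)

def incidence_density(new_cases, person_time):
    return new_cases / person_time

# 50 enrol in year 1 (followed 2 years), 50 enrol in year 2 (followed 1 year)
groups = [(50, 2), (50, 1)]
total_py = person_years(groups)
rate = incidence_density(8, total_py)
print(f"{total_py} person years; {rate * 100:.1f} per 100 person years")
```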

Attributable risk and relative risk

\(I_E\): incidence of the disease in the exposed group

\(I_{NE}\): incidence of the disease in the non-exposed group

Attributable risk

\(I_E - I_{NE}\)

Relative Risk

\(\frac{I_E}{I_{NE}}\)

  • A study assesses death rate from lung cancer among smokers and non-smokers

  • Death rate (smokers), \(I_E\) = 341.1 per 100 000 person years

  • Death rate (non-smokers), \(I_{NE}\) = 14.7 per 100 000 person years

Relative risk = \(I_E/I_{NE} = 341.1/14.7 \approx 23.2\) (the death rate in smokers is about 23 times that in non-smokers)
Attributable risk = \(I_E - I_{NE} = 341.1 - 14.7 = 326.4\) per 100 000 person years
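
Both measures follow directly from the two incidence rates. A minimal sketch using the death rates from the example; both rates are per 100 000 person years, so the attributable risk keeps those units while the relative risk is unitless.

```python
def relative_risk(i_exposed, i_unexposed):
    return i_exposed / i_unexposed

def attributable_risk(i_exposed, i_unexposed):
    return i_exposed - i_unexposed

i_e = 341.1   # smokers: lung cancer deaths per 100 000 person years
i_ne = 14.7   # non-smokers: same units
rr = relative_risk(i_e, i_ne)
ar = attributable_risk(i_e, i_ne)
print(f"RR = {rr:.1f}; AR = {ar:.1f} per 100 000 person years")
```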

Bias and cohort studies

Example: survivor bias

We want to compare an adverse effect from patients taking empagliflozin (new drug) with patients taking metformin (old drug) using registry data.

One of the challenges we will need to contend with is that current users of the new drug are very different to current users of the old drug.

Current users of empagliflozin started the drug within the past 1–2 years.

In the metformin group, some patients might have been taking metformin for 5–10 years (or longer). Patients who did not tolerate metformin, or did not respond adequately, are likely to have stopped it, so they are under-represented among current metformin users compared with the empagliflozin group.

This can introduce a form of survivor bias.
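
The mechanism behind this survivor bias can be illustrated with a deliberately simple model. The sketch below assumes, hypothetically, that 20% of people starting a drug are susceptible to an adverse effect (30% annual risk), everyone else has a 2% annual risk, and anyone who has the adverse effect stops the drug. All parameter values are invented for illustration.

```python
def annual_risk_in_users(years_on_drug, frac_susceptible=0.2,
                         risk_susceptible=0.30, risk_other=0.02):
    """Expected one-year adverse-effect risk among people still taking the drug.

    Hypothetical assumptions: a fixed fraction of starters are susceptible,
    annual risks are constant, and anyone who has the adverse effect stops
    the drug (so leaves the pool of current users).
    """
    still_on_s = frac_susceptible * (1 - risk_susceptible) ** years_on_drug
    still_on_o = (1 - frac_susceptible) * (1 - risk_other) ** years_on_drug
    share_susceptible = still_on_s / (still_on_s + still_on_o)
    return share_susceptible * risk_susceptible + (1 - share_susceptible) * risk_other

new_user_risk = annual_risk_in_users(0)        # people starting the drug now
long_term_user_risk = annual_risk_in_users(5)  # people already on it for 5 years
print(f"new users: {new_user_risk:.3f}, 5-year users: {long_term_user_risk:.3f}")
```

Although the drug and each person's risk are unchanged, people who have taken it for five years show a much lower annual risk (about 0.032 versus 0.076 in new users), because susceptible people have already left the pool of current users. Comparing new users of one drug with long-term users of another can be distorted in the same way.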

Example: immortal time bias

When comparing “treated” and “untreated” patients in a dataset, we need a way to determine who we will consider as “treated”.

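For example, an individual might be considered “treated” if they received a prescription for the drug in the previous 12 months. A participant must survive, event-free, until that prescription is issued, so the follow-up time before it is “immortal”: if that time is counted as treated time, the drug will appear more protective than it really is.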
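
A small simulation can show how immortal time biases a comparison. It assumes a hypothetical cohort in which the drug has no effect on mortality (everyone has the same constant death rate of 0.1 per person year), each person would receive a first prescription at a random time in the first four years, and anyone who dies first is never “treated”. For simplicity, “treated” here means receiving a prescription at any point during follow-up, rather than the 12-month rule described above.

```python
import random

def simulate_cohort(n=100_000, death_rate=0.1, max_follow_up=5.0, seed=7):
    """Hypothetical cohort in which the drug has NO effect on mortality.

    Returns (follow_up_end, died, rx_time) per person; rx_time is None when the
    person died before their first prescription would have been issued.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        death_time = rng.expovariate(death_rate)  # same constant hazard for everyone
        end = min(death_time, max_follow_up)
        died = death_time <= max_follow_up
        planned_rx = rng.uniform(0, 4)  # assumed timing of a first prescription (years)
        rx_time = planned_rx if planned_rx < end else None
        cohort.append((end, died, rx_time))
    return cohort

def death_rates(cohort, split_person_time):
    """Deaths per person year in the 'treated' and 'untreated' groups."""
    time = {"treated": 0.0, "untreated": 0.0}
    deaths = {"treated": 0, "untreated": 0}
    for end, died, rx_time in cohort:
        if rx_time is None:
            time["untreated"] += end
            deaths["untreated"] += died
        elif split_person_time:
            time["untreated"] += rx_time    # time before the prescription is unexposed
            time["treated"] += end - rx_time
            deaths["treated"] += died       # any death here occurs after rx_time
        else:
            time["treated"] += end          # naive: all follow-up labelled treated
            deaths["treated"] += died
    return {group: deaths[group] / time[group] for group in time}

cohort = simulate_cohort()
naive = death_rates(cohort, split_person_time=False)
correct = death_rates(cohort, split_person_time=True)
print("naive:  ", {g: round(r, 3) for g, r in naive.items()})
print("correct:", {g: round(r, 3) for g, r in correct.items()})
```

Labelling all of a treated person's follow-up as treated makes the drug look protective, whereas allocating the time before the first prescription to the untreated group gives death rates close to the true 0.1 per person year in both groups.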

Example: confounding by indication

We want to assess whether proton pump inhibitors increase the risk of pneumonia. We have a dataset that includes medicines prescribed and diagnoses, and we observe a correlation between proton pump inhibitor use and pneumonia.

It might be that proton pump inhibitors increase risk of pneumonia.

Alternatively, it might be that early symptoms of pneumonia are being treated with proton pump inhibitors.

To the extent that the correlation is explained by the treatment of symptoms related to pneumonia, this is an example of confounding by indication.

New user/incident user design in cohort studies

An important method that can reduce bias in cohort studies is the new-user (or incident-user) design.

In these cohort studies, the participant group is restricted to new users of the intervention under study.

To the extent that this restriction can be achieved, it reduces survivor bias, immortal time bias and confounding by indication.
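
One common way to approximate a new-user cohort in dispensing data is to require a “washout” period with no recorded use before the first dispensing. The sketch below assumes hypothetical inputs (a list of `(patient_id, dispensing_date)` records and the date each patient's records begin) and an assumed 365-day washout; real studies choose these definitions to suit their data source.

```python
from datetime import date, timedelta

def new_users(dispensings, records_start, washout_days=365):
    """Return {patient_id: index_date} for patients who appear to be new users.

    dispensings: iterable of (patient_id, dispensing_date) for the study drug
    records_start: {patient_id: date the patient's records begin}
    A patient is kept only if their first recorded dispensing falls at least
    `washout_days` after their records begin, so that at least that much
    recorded history shows no earlier use.
    """
    first_dispensing = {}
    for patient_id, when in dispensings:
        if patient_id not in first_dispensing or when < first_dispensing[patient_id]:
            first_dispensing[patient_id] = when

    washout = timedelta(days=washout_days)
    return {
        patient_id: first
        for patient_id, first in first_dispensing.items()
        if patient_id in records_start and first - records_start[patient_id] >= washout
    }

# Hypothetical records for three patients
dispensings = [
    ("A", date(2021, 3, 1)), ("A", date(2021, 6, 1)),
    ("B", date(2020, 2, 1)),
    ("C", date(2022, 1, 15)),
]
records_start = {"A": date(2019, 1, 1), "B": date(2020, 1, 1), "C": date(2020, 6, 1)}

cohort = new_users(dispensings, records_start)
print(cohort)  # B is excluded: only 31 days of records precede the first dispensing
```

Using the first dispensing as the index date also means follow-up starts when treatment starts, which avoids counting immortal time as treated time.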

Methods to reduce confounding

Random allocation

of participants to treatment

Restriction

limits the range of characteristics of patients in the study (e.g. active comparator, new user design)

Matching

participants in one group with another group so that they share comparable characteristics

Stratification

compares rates within subgroups (strata) with otherwise similar probability of the outcome
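
Stratum-specific estimates are often combined into a single summary. One standard technique for this, not covered in these slides, is the Mantel-Haenszel pooled risk ratio. The sketch below uses invented counts in which exposure is more common in older people, who also have a higher baseline risk.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

def mantel_haenszel_rr(strata):
    """Pooled risk ratio; each stratum is (cases_exp, n_exp, cases_unexp, n_unexp).

    RR_MH = sum(a * n0 / N) / sum(c * n1 / N), with N = n1 + n0 in each stratum.
    """
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator

# Hypothetical counts stratified by age
strata = {
    "younger": (15, 1000, 40, 4000),
    "older": (600, 4000, 100, 1000),
}
for name, counts in strata.items():
    print(name, round(risk_ratio(*counts), 2))

crude = risk_ratio(*(sum(column) for column in zip(*strata.values())))
pooled = mantel_haenszel_rr(list(strata.values()))
print("crude", round(crude, 2), "Mantel-Haenszel", round(pooled, 2))
```

The crude risk ratio (about 4.4) is inflated by confounding by age, while the risk ratio within each stratum, and the pooled Mantel-Haenszel estimate, is 1.5.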

Multivariable adjustment

adjusts for differences in a large number of factors related to the outcome using modelling techniques

Critical appraisal

  1. Does the study answer a focused research question?

  2. Was there a representative and well-defined sample?

  3. Was there an inception cohort?

  4. Was the exposure measured accurately?

  5. Were the outcomes measured accurately? (objective, subjective, masking)

  6. Were important prognostic factors considered?

  7. Was the follow-up of participants sufficiently long and complete?

References

FDA-NIH Biomarker Working Group. (2016). BEST (Biomarkers, EndpointS, and other Tools) Resource. https://doi.org/10.1164/rccm.201301-0153OC